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STOCHASTIC COLLOCATION ON UNSTRUCTURED MULTIVARIATE 

MESHES 

AKIL NARAYAN AND TAG ZHOU 


Abstract. Collocation has become a standard tool for approximation of parameterized 
sys- terns in the uncertainty quantihcation (UQ) community. Techniques for least-squares 
regularization, compressive sampling recovery, and interpolatory reconstruction are be¬ 
coming standard tools used in a variety of applications. Selection of a collocation mesh 
is frequently a challenge, but methods that construct geometrically aAIJunstructuredaAl 
collocation meshes have shown great potential due to attractive theoretical properties and 
direct, simple generation and implementation. We investigate properties of these meshes, 
presenting stability and accuracy results that can be used as guides for generating stochastic 
collocation grids in multiple dimensions. 

1. Introduction 

The field of uncertainty quantification has enjoyed much attention in recent years as 
theoreticians and practitioners have tackled problems in the diverse areas of stochastic 
analysis, exascale applied computing, high-dimensional approximation, and Bayesian learning. 
The advent of high-performance computing has led to an increasing demand for efficiency 
and accuracy in predictive capabilities in computational models. 

One of the persistent problems in Uncertainty Quantification (UQ) focuses on parameterized 
approximation to differential systems: Let u be a state variable for a system that is the 
solution to a physical model 

(1) £(u; t, X, uj) = 0, 

Above X G for p = 1, 2,3 is a spatial variable, t G IR is a temporal variable, and a; G is 
a probabilistic event that encodes randomness on a complete probability space {Q.,J-,V). We 
assume that the model Q dehnes a map uj i—)• u(t,x,uj) with u : Q. ^ B that is well-posed 
almost surely for some appropriate space B of (x, t)-dependent functions. 

The operator C may represent any mathematical model of interest; examples that are 
popular in modern applied communities are elliptic partial differential equations, systems 
of time-dependent differential equations, parametric inverse problems, and data-driven 
optimization; e.g., m Eoi isa isg dzi ESI [37]. The system dehned by the operator L may 
include boundary value constraints, initial value prescriptions, physical domain variability, 
or any combination of these iziiMiisiiiTa]. 

The sought system response u{t,x,uj) is random/stochastic, given by the solution to Q. 
The stochastic dependence in Q given by the event uj is frequently approximated by a 
d-dimensional random variable Z{uj). In some cases this parameterization of randomness 
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is straightforward: e.g., in a Bayesian framework when ignorance about the true value of a 
vector of parameters is modeled by treating this parameter set as a random vector in Q. In 
contrast, it is common in models for an infinite-dimensional random field to contribute to 
the stochasticity, and in these cases parameterization is frequently accomplished by some 
finite-dimensional truncation procedure, e.g., via the Karhunen-Loeve expansion |441188) . 
and this reduces the stochastic dependence in to dependence on a random vector Z. In 
either case, a modeler usually wants to take d = dim Z as large as possible to encode more 
of the random variability in the model. 

Under an assumption of model validity, the larger the stochastic truncation dimension 
d, the more accurate the resulting approximation. (Even when model validity is suspect, 
one can devise metamodeling procedures to capture model form error |50].) Therefore, it 
is mathematically desirable to take d as large as possible. We rewrite 0 to emphasize 
dependence on the d-dimensional random variable Z: 

(2) C{u; t, X, Z) = 0. 

In this article we are ultimately interested in approximating u{x, t, Z) or some functional of 
it, and concentrate on the task of approximating tt as a function of Z. This is the standard 
modus operandi for non-intrusive methods. 

A major challenge for modern uncertainty quantification is the curse of dimensionality. 
Coined by Richard Bellman [3|, this refers to the exponentially-increasing computational cost 
of resolving variability with respect to an increasing number of parameters. The trade-off that 
one frequently makes is that a large d induces an accurate stochastic truncation, but results 
in a computationally challenging problem since u depends on a d-dimensional parameter. 

When Z is high-dimensional, model reduction techniques such as proper orthogonal 
decomposition methods |36l |2T] or reduced basis methods |66[ l69] are useful. In many 
situations these methods are powerful in their own right, robustly addressing problems in 
the scientific community; however, many implementations of these approaches are intrusive, 
meaning that significant rewrite of large legacy codebases is required. The focus of this 
paper is directed towards a different approach: non-intrusive response construction using 
multivariate polynomial collocation. “Non-intrusive" effectively means that existing black-box 
tools can be used in their current form. In particular we will focus on weighted methods, 
which are of concern for stochastic collocation methods. Stochastic collocation entails 
polynomial approximation in parameter (Z) space using either interpolation or regularized 
collocation approximation. These approaches have become extremely popular [8911871120] for 
their efficiency and effectiveness. In many situations of interest, polynomial approximations 
converge to the true response exponentially with respect to the polynomial degree. 

In stochastic collocation, a polynomial surrogate that predicts variability in parameter 
space is constructed from point-evaluations of the model response ([^ at an ensemble of fixed 
parameter values G D] we will call this ensemble of parameter values a grid or mesh, 
or a collection of nodes. While much work exists on geometrically structured meshes (e.g. 
tensor-product lattices or sparse grids), we will focus on unstructured meshes, which we 
believe is fertile ground deserving of much attention. Our use of terms ‘structured’ versus 
‘unstructured’ refers to visual appearance of a lattice or geometric regularity of the mesh 
distribution in multivariate space. Obviously use of such a term is a subjective matter, and 
our goal is not to taxonomically classify collocation methods as structured or unstructured. 
Instead, the goal of this paper is to highlight some recent collocation strategies that distribute 
collocation nodes in an apparently unstructured manner; many of these recently developed 
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methods produce approximation meshes that have attractive theoretical and computational 
properties. 


2. Generalized polynomial Chaos 

The generalized Polynomial Chaos method (gPC) )9n) is essentially the strategy of ap¬ 
proximating the Z-dependence of u{x, t, Z) from ([^ by a Z-polynomial. Let Z be a random 
variable with density function p{z), so that P[Z € A] = f^p(z)dz for any Borel set A 
contained in the domain of Z, A <Z D C. 11“^. 

Hereafter, we will subsume t-dependence into the variable x and write u{x, Z) = u{x, t, Z). 
A gPC method proceeds by making the ansatz that the variability of u in the random variable 
Z is described by a polynomial: 


N 

(3) un{x,Z) = y~]cn(a:)(/»n(Z), 

n=l 

for a prescribed multivariate polynomial basis (pn and unknown coefficient functions Cn{x). 
The gPC approach is to choose the basis cpn to be the family of Lp-orthogonal polynomials, 

Fi(l)n{Z)(j)m{Z) = / (j)n{z)(j)m{z:)p{z)dz = 6n,m 

with 5n,m the Kronecker delta function. We also make the assumption that these polynomials 
are complete in the corresponding p-weighted space; this is satished if, for example, p is 
continuous and decays at least as fast as exp(— |z|) as \z\ —)> oo. More intricate conditions 
can be found in m- 

In many cases of practical interest, it is natural to assume that the components of Z are 
mutually independent. In this case, the multivariate functions cpn decompose into products 
of univariate functions. If the components of Z are independent, then D = n^=i Ij for 

univariate intervals Ij C IR and p{z) = Pj with z = ..., the 

components of z. Then the multivariate orthogonal polynomials are products of univariate 
orthogonal polynomials: 



We have introduced multi-index notation: a = (ai,... , 0 ^) G Ng is a mnlti-index, 2 ;“ = 
and \oi\ = Depending on the sitnation, we will alternate between 

integer and multi-index notation for the polynomial ansatz: 

N 

Un{x,Z) = ^C„(x)0n(Z) = ^da{x)pa{Z), 

n=l qsA 

where A C Ng is an index set with cardinality |A| = N. Any convenient mapping between 
1,..., Ai and A may be used. The choice of A dehnes the polynomial approximation space. 


4 


A. Narayan AND T. Zhou 


Tensor product space indices 


25 ■ 


20 ■ 


c 15 ■ 

« I 


10 ■ 


5 ■ 


0 Wmmmmwmmmmwmmmmwmmmmwmmmmw 

0 5 10 15 20 25 

Ql 


Total degree space indices 


25 


20 ■ 


c. 15 ■ 

« I 


10 ■ 


5 ■ 


oU ■■■■■■■■■■■■■■■■■■■■■■■■■ 

0 5 10 15 20 25 

Oi 


Hyperbolic cross space indices 

25 ■ 

20 J 

15 j 

10 JJ 
5 iJ 

0 Ills Ills ■!!■■■■■■■■■■■■■ 

0 5 10 15 20 25 

Q-l 


Figure 1. Indices associated with the tensor product set (left), the 
total degree set (center), and the hyperbolic cross set A^^ (right). All 
sets have dimension d = 2 and degree k = 25. 


Some classical polynomial spaces to which ui\f may belong are the tensor-product, total 

Pfc"* = span I a E A^fc} , 
rfc^ = span{z"|aE A^fc} 

= span { 2 " I a E A|;^} 

We have chosen particular dehnitions of these spaces above, but there are generalizations. 
E.g., dimensional anisotropy can be used to ‘bias’ the index set toward more important 
dimensions, or a different P’ norm (0 < p < 1) can be placed on index space to tailor the 
hyperbolic cross spaces. The dimensions of and are 

pi & dim Pi ^{k + lf. 

The dimension of has the following upper bound |61) : 

hi = dim Hi < [{k + 1)(1 + log(A: + . 

For index sets A that do not fall into the categories defined by Q, we will use the notation 
P{A) to denote the corresponding polynomial space. 

For the index sets Q, we immediately see the curse of dimensionality: the dimensions of 
Tji and P^ increase exponentially with d, although tf is smaller than pf. The indices in the 
sets A^^, AJ^, and A^^ are graphically plotted in Figure 1 for d = 2 and polynomial degree 
k = 25. 

This highlights a challenge with gPC in high dimensions: the number of degrees of freedom 
required to resolve highly oscillatory structure grows exponentially with dimension. (Indeed, 
this is a challenge for any non-adapted multivariate approximation scheme.) In the next 
section we narrow our focus to collocation schemes. 
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2.1. Stochastic Collocation. The determination oicn{x) in (|^ is the main difficulty, and 
one way to proceed is to ask that at some predetermined realizations {zm}^=i of the 
ansatz match the actual response: 

(6) UNix,Zm) - u{x,Zm), m = 

where the approximation ~ and the relation between N and M are discussed later in this 
section. (We will allow the ensemble of realizations to be randomly generated in some 
cases, but we will continue to use lowercase notation Zm to denote specific samples of the 
random variable Z.) This is a stochastic collocation approach and is non-intrusive: we 
need to compute u{x,Zm), but this is accomplished by simply setting Z = in (|^ and 
solving M realizations of the model equation. Therefore, existing deterministic solvers can be 
utilized. This ability to reuse existing solvers is one of the major strengths of non-intrusive 
(in particular, collocation) strategies. 

In contrast, intrusive methods generally require a nontrivial mathematical reformulation 
of the model ([^, and usually necessitate novel algorithm development. One popular intrusive 
method for gPC is the stochastic Galerkin approach, where ujsf is specified by imposing that 
some probabilistic moments of the model equation residual vanish. Because these moment 
equations are coupled and are a novel formulation compared to the original type of equation, 
existing deterministic solvers of (|^ cannot be used. The advantage of intrusive methods 
compared to non-intrusive methods is that one can usually make more formal mathematical 
statements about convergence of ujsf with an intrusive formulation. For details of intrusive 
methods, see [44) . Although intrusive methods are advantageous in many situations, in this 
paper we only consider non-intrusive collocation approaches. 

The focus of this paper discusses how to enforce the collocation conditions (|^. We 
investigate three situations: 

• M > N: when there are more constraints than degrees of freedom, we employ 
regression to attain a solution 

• M < N: with more degrees of freedom than constraints, we may seek sparse solutions 
and appeal to the theory of compressive sampling 

• M = N: with an equal number of linear constraints and degrees of freedom, we may 
enforce interpolation 

Examples of sampling strategies that we consider as ‘structured’ are tensor-product 
constructions and sparse grid constructions. The former is quickly seen as infeasible for large 
dimensionality d. If we have an m-point one-dimensional grid (such as a Clenshaw-Curtis 
grid), then an isotropic tensorization has M = samples; this dependence on d is usually 
not computationally acceptable. 

Sparse grids are unions of anistropically-tensorized grids, and have proven very effective 
[651 ca isiia at approximating high-dimensional problems. However, the sparse gridsaAZ 
adherence to rigid and predictable layouts have the potential weakness of aAYmiss- ingaAZ 
important features that do not line up with cardinal directions or coordinates in multivariate 
space, especially earlier in the computation when adaptive methods are seeded with isotropic 
grids. We believe that unstructured sample designs have the potential to mitigate these 
shortcomings. 

2.2. Multivariate collocation. We introduce notation that is used throughout this article. 
In particular, we reserve the notation N and M to denote the number of terms in the 
expansion ([^ and the number of collocation points in the ensemble (|^ , respectively. We will 
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also use k to denote the polynomial degree of an index set A when this index set corresponds 
to one of the choices Q. 

The main goal is to compute the coefficient functions Cn from and this can be decom¬ 
posed into smaller linear algebra problems. In practice the realization u(x,Zm) (as a solution 
to ([^ with Z = Zm) usually computed via a spatial discretization as an ii-dimensional 
vector \i{zm)- The type of spatial discretization to which \i{zm) corresponds typically does 
not influence Z-approximation strategies if non-intrusive methods are considered. 

Let A be an M x N matrix with entries (A)m,n = (pnixm) with the orthogonal gPC 
basis. Then the conditions ([^ can be written as 

(A®I)(C) «U, 

where C is an RN x 1 vector with entries Cr,n = Cn,r ordered lexicographically in (r, n), U is 
an RM x 1 vector with entries 17r,m = (u(.2m))r ordered lexicographically, and I is the Rx R 
identity matrix. Then clearly the above system can be rewritten as the following decoupled 
series of systems: 

(7) ACr^Ur, r = l,...,R, 

where has entries (c^)„ = Cn,r, and has entries (u^)^ = (u(xm))j,- 

Thus the stochastic collocation solution to ([^ under the polynomial ansatz (|^ with 
conditions ([^ is given by the solution to (Q. Therefore, as is common in the stochastic 
collocation community, we focus entirely on solving the model problem 

(8) Ac Ri u, 

for the coefficient vector c and given data u obtained from non-intrusive queries of the mode 
problem ([^. 

The three major approximations for (|^ that we consider are 

(1) Regression/regularization: we enforce ([^ in the least-squares sense over the nodal 
array 

(2) Interpolation: we enforce exact equality in Q 

(3) Compressive sensing: under the assumption that the exact solution coefficients Ck,n 
are sparse in n, we attempt to recover a sparse solution c to Q. 

Our focus in this paper is on presentation on types of ’unstructured’, high-order collocation 
approximation methods. We will introduce the above approximation methods and present 
some recently-developed unstructured mesh designs that complement each method. Our 
discussion revolves on the following considerations: 

(1) For a given mesh, how does the stability of the problem scale with the approximation 
order N and the sample size M? 

(2) For a given mesh, how is the accuracy of the reconstruction affected by the approxi¬ 
mation order N and the sample size M? 

(3) Can we find or generate a mesh for which the approximation problem has ‘nice’ 
stability or accuracy properties? 

These concerns undergird most of our future discussion. 

3. Regression 

Least-squares regression is one of the most classical approaches to collocation approxima¬ 
tion, with vast literature for recovery with noisy data I810115]. In UQ applications, the 
uncertainty of the input parameters can be treated as random variables Z. It is therefore 
reasonable to assume that all the stochasticity/uncertainty in the model is described by Z. 
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Thus, we shall focus on the least-squares regression with noise-free data u. Such UQ-focused 
least-squares regression (sometimes called point collocation |47| 1 has been widely explored 

[731 [551 |99l [301 E]. 

For the least-squares approach we consider here, the gPC coefficients c„ are estimated 
by minimizing the mean square error of the response approximation, i.e., one finds the 
least-squares solution ujv that satishes 

M 

(9) UN ■= argmin ^ {p{zm) - u{zm)f , 

peP(A) 

where the index set A may be any general index set, e.g. or T^. For convenience we also 
introduce the discrete inner product on parameter space 


( 10 ) 




1 

M 


M 

m=l 


1 /2 

and the corresponding discrete norm ||m||m = {u,u)]f^ . 

The formulation (|^ is equivalent to requiring that the model problem ([^ is satisfied in 
the following algebraic sense 


( 11 ) 


c = argmin 

veK^v 


I Av — u| 


Alternatively, the solution to the least-squares problem (11) can also be computed by solving 
an N X N system (the “normal equations"): 


(12) 

Ac = u 

with 


(13a) 

A:= ATa= (M (</.„, 

(13b) 

u := A^u = (M {u, (pa))aeA 


We will describe three kinds of such unstructured collocation grids, for which the corresponding 
theoretical analysis has been addressed recently and is under active development. 


3.1. Monte Carlo sampling. Monte Carlo (MC) sampling is a natural choice for least- 
squares regression. For example, one generates independent and identically-distributed (iid) 
samples from a random variable with density p (recall that Z is a random variable with 
density p), and these samples form the nodal array {zm}- This choice of sampling is certainly 
justihable: It is straightforward to establish that the discrete formulation (0 converges to 
the Lp-optimal continuous formulation as M —)• oo: 

M 

lim argmin {p{zm) — u{zm))‘^ = argmin E (p(Z) — u{Z))^ 

M^oo pgp(A) p^p^N) 

It is thus not surprising that this least-squares formulation with random samples is popular 
Ha ESI Eg E]. In practical computations, the number of design samples M drawn from the 
input distribution scales linearly with the dimension of the approximating polynomial space 
N] taking M ~ cN with c between 2 and 3 is a common choice. Thus we desire stability and 
convergence results under the assumption that the sample set size M scales linearly with the 
approximation space dimension N. Such results are not yet definitively available, but below 
we discuss what has been accomplished in this direction. 
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Perhaps the simplest example is MC with Z a uniformly-distributed random variable on a 
compact interval: Consider p the uniform measure on [—1,1], The analysis in [30] shows that 
if one generates M ~ N'^ iid MC design samples drawn from p, then the spectral properties of 
the least-squares design matrix are controlled, implying stability for recovery with regression. 

Theorem 1. 


[30] Let p he the uniform measure on [—1,1], and A = A;[ n-i- r > 0, 

assume that > CrN"^ for a universal constant C. Then 





1 

Pr 


A-I 

> - 



“ 2 


< 2M- 


This stability result can also be used to prove near-best approximation properties. These 
results are extended in m to multidimensional polynomial spaces with arbitrary lower index 
set^ I.e., let p be the the uniform measure on [—1; 1]'^, and assume quadratic dependence 
M as in the univariate case. If A in (|13|) is a lower set, then similar quadratic scaling 


M oc N'^ ensures stability and near best approximation of the method independent of the 
dimension d. 

If instead one considers sampling from the Chebyshev measure 

/ / n2\-V2 

( 14 ) • ^£[- 1 . 11 " 

i=i ^ ^ 

with the associated Chebyshev polynomial basis (fa^ then the quadratic dependence can be 
weakened to M ~ A^sr 2 , where ~ 1.58. Such a technique was applied to a class of elliptic 
PDE models with stochastic coefficients, and an exponential convergence rate in expectation 
was established |27|. 

The analysis for unbounded state spaces D is less straightforward; for these unbounded 
domains, some of the tools used to establish the results above cannot directly be applied. 
Nevertheless, for the univariate exponential density function p{z) = exp(—z^), the authors in 
m use a mapping technique in conjunction with weighted polynomials to establish stability. 
With this choice of p, the orthogonal family of Hermite polynomials 4>n{z) = Hn{z) would 
be the gPC basis choice. Consider approximation not with polynomials, but instead with 
the Hermite functions: 

ipniz) = expi-z"^/2)f)n{z) = exp(-z^/2)if„(z). 

The collocation samples Zm are not generated with respect to the density p. Instead, one first 
generates samples of a uniform random variable on the interval [—1,1] and subsequently 
maps these to the real line via the transformation 




(- 1 , 1 ), 


where the scalar L is a free parameter. Thus, the least-squares design matrix A from (12) is 
now given by 

1 ^ 

(15) A = (M(V’i,V’i)M)ij=l,...,7V> = J^'^u{z{im))v{z{im)) ■ 


m=l 


The authors in m prove stability of the formulation ( |15[ ), requiring only linear scaling of M 
with respect to N (modulo logarithmic factors). 


set A is a lower set in this paper if for any a £ A, all indices below it also lie in A: I.e., 
{/3 £ INq I d < n} C A holds for any a £ A. Here, the ordering < is the partial ordering on Njj 
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M = 359, M = 179 M = 751, M = 375 




Figure 2. Weil grids in two dimensions (d = 2). Left: prime number seed 
A4 = 359 resulting in M = 179 points. Right: prime number seed A4 = 751 
resulting in M = 375 points. 


Theorem 2 f|78)l. For any r > 0, assume o-nd L > \/N. Then the least-squares 

design matrix from (|15[) is stable in the sense 





5 

Pr 


A-I 

> - 
“ 8 




The constant C is universal. 


Because the hermite functions -0^ are weighted versions of the Hermite polynomials 
Hm, one can consider the above a statement about stability of a weighted least-squares 
approximation problem. 

In this subsection, Monte Carlo grids are themselves randomly generated, so most state¬ 
ments about stability and accuracy of these least-squares formulations are probabilistic in 
nature, e.g., convergence with high probability or convergence of the solution expectation. 
In the next subsection we consider one type of deterministically-generated mesh. 


3.2. The Weil points. In certain applications, a judicious, deterministic choice of samples 
may provide several advantages over randomly-generated samples. In [99], the authors present 
a novel constructive analysis for the deterministically-generated Weil points. Suppose that 
is a prime number; |99| proposes the following sample set: 

(16) Wm ■= [zj+i = cos(2/j+i) : Vj = 2 tt (j,f,.. ., /M, j = 0,..., [Al/2 - Ij } , 

where [A^/2j gives the integer part of A^/2. Note that the number of the points in the 
above grid is M = [A^/2j. In fact, with Zj as in ( [l6| , one can show that the points 
coincide with the set of points {zj}j^, and so this latter half of the set is discarded. 
Examples of two-dimensional Weil grids are shown in Figure]^ with = (359,179) 

and {M,M) = (751,375). 
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The above collocation grid Wm, designed originally for approximation when p is the Cheby- 
shev measure, is motivated by the following formula of Andre Weil (hence the eponymous 
title “Weil points"): 


Theorem 3 (Weil’s formula (85j). Let M be a prime number. Suppose f{w) = miw + 
m 2 w‘^ + • • • + mdW^ and there is a j, 1 < j < d, such that Ai f mj, then 


(17) 


M-l 

E 


'2-n-ifU) 

e 


j=0 


< {d-l)VM. 


This formula plays a central role in deriving several properties about the Weil points, 
including least-squares stability results. Using Weil’s formula, [99] shows that the Weil points 
distribute asymptotically according to the Chebyshev measure: 


Theorem 4 (|99jl. Let Wm^ — {^i,K, ■ ■ ■ AMk,k} deterministic sampling set (16) 

generated by the K ’th prime number. We define the empirical measure of the Wm^ ■ 


VK ■■ = 


1 

Wk 


Mk 

i=i 


where 6z is the Dirac measure centered at z, and let Vc be the Chebyshev measure with density 
dufiz) = Pciz) from (14). Then uk —t Vc in the weak-* toplogy as K 


oo. 


Having established what kind of measure the Weil points (16) sample according to, we 


turn to stability. Assume that p is the Chebyshev density; namely, we use the tensorized 
Chebyshev polynomials as the gPC polynomial basis. By utilizing the Weil points in the 
least-squares framework, one can obtain estimates for the components of the design matrix 
A by using Weil’s formula Theorem]^ These estimates, in conjunction with Gerschgorin’s 
Theorem, result in the following stability result: 


Theorem 5 (j99)). Suppose that I is the size-N identity matrix, A is the design matrix 
associated with the Chebyshev gPC basis, and the Weil points are generated with the prime 
number seed A4, yielding M points. If M > C{d) ■ then the following stability result 
holds 




M 


1 

“ 2 ’ 


where 


is the spectral norm. 


Therefore the corresponding least-squares problem admits a unique solution, provided that 
M > C{d) ■ N"^. Stability results for least-squares problems are one ingredient for deriving 
convergence results. For example, the above stability result yields the following bound on 
least-squares error: 

Theorem 6 1 j99)l. Let f £ L^ be a multivariate function, and let p'^ be the L°°-best 
approximating polynomial from P{A), and let P^f be the least-squares P{A)-solution with 
the Weil points. If the prime number seed satisfies Ai > C{d) ■ N'^, then 

\\f-PMf\\Ll<C\\f-p*M\L-. 

The above theorem implies the error for the least-squares solution is comparable to 
the optimal approximation. However, we have assumed the quadratic dependence of 
the number of design points on the degree of freedom of the approximation space. This is 
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stronger than the dependence ~ in 2 for the Monte Carlo sampling from the Chebyshev 


measure as discussed in Section 3.1 Nevertheless the construction here, being deterministic, 
does not suffer from probabilistic qualihers on convergence results. (E.g., convergence “with 
high probability".) Moreover, we shall show with numerical examples that least-squares 
regression with Weil points performs comparably to Monte Carlo sampling. 

For general (non-Chebyshev) types of orthogonal polynomial approximations on compact 
domains, |99] also proposed the following weighted least squares approach 


M 


(18) 


UN,w = Pm w'^ = argmin ^ Wm {u{zm) - p{zm)f , 
peP(A) 


with positive weights Wm dehned by 

Pci^m) 


'Wm. — 


= Tr'^piZm) n ( 1 


9=1 


- 


1/2 


where z^ is the ( 7 th component of the grid point Zm- The basis set (j)^ is the /o-gPC basis. 


orthonormal under the weight p, and p^ is the Chebyshev density (14). The choice of the 


weights Wm above ensures that the induced change of measure makes the Weil-sampled 
zcm-weighted norm equivalent to the sought p-weighted norm. 

An example of this will be illustrative: suppose we let p be the uniform (probability) 
density p{z) = 2“'^ on [—1,1]*^. Then we have 

d 


(19) 


Wrn. — 


Pci^m) 


(vr/2)"n(l 

9=1 


- 2 :, 


.(9) 


1/2 


Since Wm is applied to the quadratic form (18), we are effectively preconditioning f{zm) 
with y/Wm- Thus, if the <j)j are tensor-product Legendre polynomials (orthonormal under the 
uniform density), then we are preconditioning our expansion as 


N N N / d 

j=l j = l j=l \g=l 


1/4 


(/>j(z) 


This type of preconditioning is known to produce well-conditioned design matrices in the 
context of minimization for Legendre approximations |72| . Of course if p oc then 


we obtain constant weights. Therefore, the weights proposal (19) reduces to well-known 
preconditioning techniques for some special cases. 


3.3. Structured random points. Although a structured grid, such as the tensor-product 
grid, will frequently be impractical in computations for d A> 1 , the grid itself may be used 
as a candidate set on which to extract a subset of points that is useful for approximation. 
One way to do this is to randomly sample a subset grid from a high-cardinality structured 
candidate set, thus producing a subset grid that is essentially unstructrued. 

For example, assume that we try to hnd an approximation in the hyperbolic cross space 
with N = hf. We assume that the functions 4’q{x) that span are tensor-product 
Chebyshev polynomials. Such an expansion can also be viewed as an expansion in the tensor 
product space P^, i.e. 

(20) u{z)= ^ c„(^„( 2 ;), c„ = 0 if a G A^fc\A^fc. 
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M = 2.5N 



Polynomial order k 


M = 1.5N\ogN 



Polynomial order k 


M = {).‘6N {logNf 



Polynomial order k 


Figure 3. Condition number with respect to the polynomial degree k in the 
4-dimensional hyperbolic spaces. Left: M = ^N. Middle: M = |A^logA^. 
Right: M = O.^Nlog^ N 


Now let {ym}m=i be the tensorized grid of the one dimensional Chebyshev Gauss quadrature 
points, where pf = {k + 1)'^. With such a tensor grid {ym}-, one can exactly recover any 
polynomial in by the generalized discrete Fourier transform. The normal equations design 
matrix is the identity matrix in this case. Because hf <C tf when d is large, we have the 
freedom to choose a subset of points with cardinality M from the full tensor grid, with M 
satisfying 

( 21 ) hi<M<pi 


We will select these points Zm randomly with the equal probability law on the candidate set 
{yra}- This idea is proposed in [98] . and it is shown that, using the Chebyshev density p and 
associated gPC basis, the design matrix is stable with probability at least 1 — for 

any p >2, provided that 


( 22 ) 


M 

log M 


> CpN. 


The framework in [98] applies to approximations with densities p more general than Chebyshev 
measures. In particular, it includes measures on bounded domains such as the uniform 
measure, and on unbounded measures such as the Gaussian measure. 


3.4. Numerical tests. We now provide some numerical examples to illustrate the stability 
and the convergence properties of the least-squares approach with the design points described 
above. Many more related tests are available in [611IM] 198] . We first investigate how the 

number of collocation points affects the condition number, cond(A) = where Umax 

^min 


and (Jmin are the maximum and minimum singular values, respectively. These results are 
shown in Figure]^ for the 4-dimensional hyperbolic space The orthogonal polynomial 
measure p is the Chebyshev measure and the basis elements are tensor-product Chebyshev 
polynomials. We test the three design sampling methods described in the previous sections: 
“MC" corresponds to the Monte Carlo design of Section 3.1 “Weil points" corresponds to the 
Weil points design of Section [3.2] and “Gaussian" refers to random subset sampling from a 


tensor-product Gauss quadrature grid from Section 3.3 


Because the design matrices with the structured points and the random points are random 
matrices, we repeat each of these tests 200 times and report the mean condition number. 
We also plotted the error bars for the two kinds of random samples corresponding to one 
standard deviation above the mean. The left plot of Figure]^ shows results for linear scaling 
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Legendre approximation, M = 2N 


Chebyshev approximation, M = 2N 




Figure 4. Approximation error in the two dimensional total degree space. 
Left: Chebyshev approximation. Right: Legendre approximation 


of M with respect to N, i.e. M ~ |A^, and the middle plot shows log-linear dependence 
M ~ |A^logA^. The right plot shows and even stronger dependence, M Q.miog^N, We 
note that the performance of all three methods is similar, in the sense that the log-linear 
dependence (middle and right plots) admits stable condition numbers with respect to the 
polynomial order k. In contrast, the linear rule (left plot) admits modest growth of the 
condition number with respect to the polynomial order k. We also observe that the Weil 
points perform slightly worse compared to the other two types; however, we emphasize that 
the Weil sampling method and the corresponding analysis is deterministic. Not shown is 
quadratic dependence, M ~ because the Figure]^ shows that log-linear dependence is 
enough to guarantee stability. Thus, there seems to be room for improving the quadratic 
dependence estimate in |99) . 

Finally, we test the convergence rate of the least-squares approaches in d = 2 dimensions. 
The target function is smooth, chosen as u{z) = exp“^i=i“*^% where the parameters {a*} 
are generated randomly. We measure the error in the L°° norm, computed on a set of 
2000 points that are independent samples (i.e., different from the design points) from a 
uniform distribution on [—1,1]^. The errors with respect to the polynomial order k from the 
two-dimensional total degree space Tj^ are shown in Figure In the left-hand pane, the 
underlying measure is the Chebyshev measure and the basis functions (/>„ are Chebyshev 
polynomials. In the right-hand plot, the underlying measure is the uniform measure and 
we use the Legendre polynomials as basis elements. In this framework, we will use the 
preconditioned version of least-squares (19) with Chebyshev-like design points (i.e., random 
Chebyshev points, the Weil points, and points randomly selected from Chebyshev Gauss 
points). In all the plots, we have used the linear rule M = 2N which is dependence that is 
more feasible in practical computations. 

The results given in Figure illustrate that the linear rule display the exponential 
convergence with respect to k, for both the least-squares and its preconditioned version. The 
convergence stagnates at machine precision, which is expected. Such a result also points out 
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a gap between the M/N dependence necessary to achieve optimality in current theory, and 
the condition that in practice yields an optimal convergence rate. 

Remark 1. We note that Quasi Mento Carlo points are deterministically-generated and 
can also be used in the least-squares framework, and one can find such investigations in 
|28l H2] . We also remark that the random parameters here are assumed to be supported in 
[—1,1]'^. One may of course encounter problems with unbounded parameters, e.g., problems 
with Gaussian/Gamma parameters. Recent work in [781 investigates such situations. 

4. Compressive sampling 

Compressive sampling (CS), or compressive sensing, considers recovery the of a sparse 
representations from limited data, and in applications this is usually considered when there is 
insufficient information about the target function. In the framework of this paper, this occurs 
when the number of samples M is less than the cardinality of the polynomial space for the 
approximation N. CS is an emerging and maturing area of research in signal processing that 
aims at recovering sparse signals accurately from a small number of their random projections 
(see e.g. 121 EH US IMl [29] and references therein). A sparse signal is simply a signal that 
has only few significant coefficients when linearly expanded into a non-adapted basis, in our 
case the {4>a}- Thus, when the sample count M is smaller than the approximation dimension 
N then one can use to CS approaches to construct a polynomial approximation, under the 
assumption that the underlying target function is sparse in (/>„. The typical approach in 
CS is to minimize a norm of the polynomial, with the constraint of matching the data. For 
sparsity, one seeks to minimize the Iq norm ||c||g of the coefficient vector, and under certain 
conditions the minimizing coefficient vector is identical to that which minimizes the £i norm 
||c||^. Solving the latter minimization problem is preferred in practice because such problems 
are convex optimization problems. 

The success of the CS lies in the assumption that in practice many target functions are 
sparse: what appear to be a signal with many features may contain only a small number of 
notable terms when transformed to the frequency domain. This is indeed the case in many 
UQ problems. For instance, solutions to linear elliptic PDFs with high-dimensional random 
coefficients admit sparse representations with respect to the gPC basis under some mild 
conditions 

For stochastic collocation methods in the CS framework, we are interested in the case 
that the number M of solution samples is much smaller than the number N of unknown 
coefficients. One then seeks to a solution c with the minimum number of non-zero terms. 
This can be formulated as the optimization problem 

(23) Pq : argmin ||c||o subject to Ac = u, 

where ||c||o : #{« : Ca 0} is the norm on vectors and should be interpreted as the 
number of non-zero components of c. In general, the global minimum solution of Pq is not 
unique and is NP-hard to compute. Fortunately, under restricted isometry conditions on the 
design matrix, the computed minimizer approximates the Iq minimizer very well, even 
with noisy measurements |25[ 126] . The minimization problem in our context is 

(24) Pi : argmin||c||i subject to Ac = u, 

(Above, II • 111 is the norm on vectors.) The advantage of the Pi formulation is that it is a 
convex problem, and so computational solvers for convex problems may be leveraged |971195j . 

In practice, since the approximation basis is truncated to a finite number of functions, the 
approximation u may not be exact. Further, the measurement vector u may be corrupted 
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by noise. These factors lead to the relaxation of the equality in (24) and reformulate the 
problem under the Basis Pursuit Denoising form: 


(25a) 


Po,£ : argmin||c||o subject to ||Ac = u ||2 < £, 


(25b) 


Pi^e : argmin||c||i subject to ||Ac = u ||2 < e, 


In what follows, we will focus on the traditional £i-minimization form |2^ Consider the 
error in the best s-term approximation of a coefficient vector c G 


(26) as,p(c) = inf ||y-c||p 

lly||o<s 

Clearly, as,p{c) = 0 if c is s-sparse, i.e., ||c||o < s. Let A be an M x A matrix. Dehne the 
restricted isometry constant (RIC) ds < 1 to be the smallest positive number such that the 
inequality 

(27) (1 — 5 s)||c ||2 < ||Ac||| < (1 + 5 s)||c ||2 

holds for all c G satisfying ||c||q < s. If the above holds, the matrix A is said to satisfy 
the s-restricted isometry property (RIP) with restricted isometry constant 5s- Now, we are 
ready to state the following sparse recovery for RIP matrices . 


Theorem 7 m)- Let A G he a matrix with RIC such that Sg < 0.307. For a given 

c G let he the solution of the ii-minimization 


(28) 


argmin||c||i subject to Ac = Ac. 


Then the reconstruction error satisfies 
(29) llc’^ 


c||2 < c 




for some constant C > 0 that depends only on 5s- In particular, if c is s-sparse then 
reconstruction is exact, i-C-, c^ = c. 


The theorem above indicates that, as with regression, the choice of nodal array is of great 
importance: A ‘good’ nodal array will lead to a design matrix A with an acceptable RIC so 
that Theorem may be invoked for convergence. 


4.1. Monte Carlo sampling. As in the least-squares framework, the MC sampling method 
is still promising in the CS framework. Indeed, much of the pioneering CS work employed 
MC sampling |24[ HEl [531 ESj- However, using such an idea for UQ applications is relatively 
new. Some of the hrst work in this area was investigated in |58l 00] , where the authors 
applied CS ideas to stochastic collocation and obtained some key properties, such as the 
probability under which the sparse random response function can be recovered. 

The authors in [za UM investigated the recovery of expansions that are sparse in a 
univariate Legendre polynomial basis. Their analysis relies strongly on RIP results from 
bounded orthonormal polynomial systems, which we summarize in the following: 

Theorem 8. m Let {fin} be a hounded orthonormal system, namely, 

(30) sup ||(()„||oo = supsup |(/)n( 2 ;)| < L 

n n z 

for some constant L > 1- Let A G be the interpolation matrix with entries {an,m = 

(/>n-i(2m)}^<^<jY ^ M X M diagonal matrix with entries Wm,m = 
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(7r/2)^/^(l — where the points {zm}i<m<M o,re iid samples drawn from the one- 

dime 

(31) 


dimensional Chebyshev measure (14). Assume that there is a S > 0 such that 

M > C6~‘^L^slog^{s)\og{N). 


Then with probability at least 1 — N the RIC 6s of -^WA satisfies 6s < <5. Here 

the C, 7 > 0 are universal constants. 

The authors in m use the above use to provide the following recoverability result for one 
dimensional sparse Legendre polynomial expansions: 


Theorem 9 (|72]b Let Zm be iid samples from the Chebyshev measure, and let A G 
be the corresponding Legendre polynomial (jin design matrix with entries 

{am,n = 

and let W be a diagonal matrix with entries Wm,m = (^/2)~^'^^(1 — Assume that 

(32) M > Cslog^(s)log(A). 

Let c G be any coefficient vector, and consider the solution to the following 
optimization problem: 

(33) = argmin ||c||i subject to WAc = WAc. 

Then with high probability the solution is within a factor of the best s-term approximation 
error. I.e., 


Pr 


Above, C and 7 are universal constants. 


— c 

^ CcFsp (c)' 


2 “ \/s J 


> 1 _ 


Note that in the above theorem, the random samples are drawn from the Chebyshev 
measure, namely, the CS method above is a Legendre-preconditioned £i-minimization with 
Chebyshev samples. The preconditioned/weighted Legendre polynomials have a uniform 
bound [77] when the weight is {1 — and this is the main reason to define the weight 

matrix W to obtain Theorem [8] or Theorem 4.3. 

The above result is extended in |94| to high-dimensional problems, for both the original 
^i-minimization and the preconditioned £i-minimization. We summarize these results with 
the following theorem: 

Theorem 10. |94| Let {(l)n}n=o Legendre polynomial bases of the total degree space 

T^, and let u(z) = J2n=o ^n4>n be an arbitrary polynomial with coefficient vector c. For 
some nodal array {zm}i<rn<M’ ^ ^ corresponding M x N design matrix and 

M X M diagonal preconditioner/weight matrix, respectively, with entries 


— 0n—1 








(1) Let {zm}i<m<M be i.i.d. random samples drawn from the uniform measure, and 
assume that 


M > 3^slog^(s) log(A). 
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Then with high probability, the solution to the direct minimization problem (28) 
is within a factor of the best s-term error: 


Pr 


c* -c 

^ Cas{c)i' 


2 “ Vs \ 


> 1 _ j\r-7iog®(s) 


(2) Let {zm}i<m<M be i.i.d. random samples drawn from the Chebyshev measure, and 
assume that 

M > 2'^s log^(s) log(iV). 

Then with high probability, the solution to the preconditioned/weighted mini¬ 
mization problem (33) is within a factor of the best s-term error: 

Pr \ c* -c < > 1 _ N-i^og^s) 

2 ^ 

For both of the above cases, the constants C and 7 are universal. 


The above results implies that in very high dimensional spaces {d > k), the Chebyshev 
preconditioned ^^-minimization may be less efficient than the direct -^^-minimization, because 
it may require more sample points when 2'^ > 3^. Of course when d = 1 only the k = 0 trivial 
case satishes this inequality, so in one dimension the preconditioned case is more effective 


We remark that some modihed CS methods have also been investigated in the random 
sampling framework, one can refer to |67[ |9^ for the weighted (re-weighted) approaches. 


4.2. Deterministic sampling. Although random sampling methods have been widely 
used in the CS framework, a judicious, deterministic choice of points may provide several 
advantages over randomly-generated points. Deterministic CS sampling and recoverability is 
a well-studied field for recovery of sparse trigonometric polynomials HU mi EH ESI [16]. 

However, such a framework has not been widely investigated for UQ applications, especially 
for recoverability of general types of sparse polynomial expansions. In [93], the authors use 
the Weil points (16) to recover sparse Chebyshev polynomials. They use the Weil exponential 
sum formula. Theorem to control the incoherence parameter of the design matrix, which 
in turn can be used to ascertain RIC information. More precisely, we have 


Theorem 11 ([93]). Let 


u = 




be an arbitrary Chebyshev polynomial expansion in P{A) with coefficient vector c. Suppose 
that A4 > C(A)s^ is a prime number, and assume that is given by the ii-minimization 
problem ( [2^ with the design matrix A being the evaluations of the Chebyshev bases on the 
Weil points generated by prime number seed A4. Then 


(34) 



< 


0-5,1 (c) 


Thus, minimization can recover s-sparse Chebyshev polynomials, provided that the 
number of the Weil samples scales quadratically with the sparsity s, i.e., M C{A)s^. The 
results apply to any high dimensional polynomial spaces with downward closed multi-index 
sets, such as the T^, P^ and H^. However, different spaces result in different constants C'(A), 
and the estimates for C'(A) in |93| are not optimal. Although the Weil estimates obtained 
are not optimal in the sense that they require that the number of sample points M scale 
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Chebyshev approximation, AI = 161, k = 30 



Legendre approximation, M = 161, fc = 35 



Figure 5. Recovery probability with respect to sparsity s in the two- 
dimensional total degree space T^. Left: recovery of sparse Chebyshev 
polynomial representations. Right: recovery of sparse Legendre polynomial 
represent at ions. 


quadratically on the sparsity s. Nevertheless, we will show via numerical tests that the Weil 
points have a similar recoverability properties when compared with MC samples. 

The results indicating that the Weil points perform comparably to MC sampling (see 
Section 4.3) are not necessarily surprising: We have shown in Theorem that the Weil points 
distribute asymptotically according to the Chebyshev measure. Thus, it might be expected 
that the Weil points produce similar performance as MC Chebyshev samples. Moreover, [93j 
also proposed the preconditioned ^^-minimization to handle general sparse polynomials with 
the Weil points. However, such a framework has the similar drawback as we discussed in the 
last section: In very high dimensional spaces {d > k), the preconditioned ^^-minimization 
may be less efficient than direct ^^-minimization. 


Remark 2. Similar to Section \3.!^ one may use what we refer to as structured random 
points (samples randomly chosen from a tensor-product Gauss quadrature grid, or some other 
candidate set) to recover sparse polynomials. The idea is similar to Section 3.3, so we will 


not discuss it in detail; However, we will test the numerical performance of this method in 
the next section. 


4.3. Examples. This section paralles Section pL4| We will compare the performance of MC 
samples, Weil points, and random subsampling from a structured tensor-product Gauss 
quadrature grid. We are interested in the recovery performance with different kinds of points 
via preconditioned/weighted li minimization. We assume the target (exact) function has 
a polynomial form, i.e. u{z) = with ||c||q = s, and attempt to recover this 

vector. For a given sparsity level s, we hx s coefficients of the polynomial while keeping 
the rest of the coefficients zero. The values of the s non-zero coefficients are drawn as iid 
samples from a standard normal distribution. We repeat the experiment 100 times for each 
fixed sparsity s and calculate the success rate on these 100 runs. (’’Success" here means that 
||c — < 10“^.) 

As in Section 13.41 we use the terms “MC", “Weil", and “Gaussian" to refer to Monte 


Carlo sampling (sections 3.1 and 4.1), Weil points sampling (section 


sampling from a tensor-product Gauss quadrature grid (section 3.3), respectively. 


3.2 and 4.2), and subset 
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To solve the minimization, we employ the available tool SPGLl from |83j that was 
implemented in Matlab. We will conduct two groups of tests, the first using a Chebyshev 
polynomial expansion, and the second using a Legendre polynomial expansion using the 
preconditioning strategy of Theorems E and [T^ 

The hrst test is a low dimensional test, with the index set A corresponding to the two 
dimensional total degree space T|. In the left-hand plot of Figure we show the recovery 
rate for sparse Chebyshev polynomials (with k = 20, d = 2, and M = 74) as a function of 
sparsity level s. We see that the three kinds of design points have very similar performance. 
In the right-hand plot, we show the recovery results for sparse Legendre polynomials (with 
k = 35, d = 2 and M = 66) using the preconditioned formulation (33). We have also tested 
direct (“unpreconditioned") recovery with MC samples drawn from the uniform measure. 
For this low dimensional test, the preconditioned results are slightly better compared to 
direct ^i-minimization. 


Chebyshev approximation, M = 161, k = 4 Legendre approximation, M = 197, k = 3 




Figure 6. Recovery probability with respect to sparsity s in the 15- 
dimensional total degree space. Left: recovery of sparse Chebyshev polynomial 
representations. Right: recovery of sparse Legendre polynomial representa¬ 
tions. 

The second example we consider is the 15-dimensional total degree space In Figure 
the left-hand plot shows the recovery results for sparse Chebyshev polynomials (with 
k = A, d = 15), with respect to the sparsity level s and a hxed number of design points 
(M = 81). Again, we see that the three kinds of design points have very similar performance. 
The right-hand plot provides recovery results for the sparse Legendre polynomials (with 
k = A, d = 15, and M = 97). For this higher dimensional test, the direct ^i-minimization 
with MC uniform points performs better than the preconditioned versions (with Weil points, 
the Chebyshev MC points, and the Gaussian points). This is a direct manifestation of the 
analysis provided by Theorem |10| Finally, we remark that within the preconditioned tests 
shown in Figure the Gaussian points have a noticeably performance than the other two. 

5. Interpolation 

The last type of approximation we consider is interpolation: given a general array Zm = 
{zi ,..., Zm} of unique nodes and data Um = u{zm), we wish to construct a polynomial p 
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satisfying 

(35) p{z) = '^Cafpaiz), p{Zm)=Um, m = 

aSA 

for some index set A with |A| = M = In the context of the model problem Q, we have 
the N X N system with A as defined in previous sections: 


(36) 


Ac = u. 


(^)n,a ~ 4’aiZn) 


Usually we want p to be as “simple" as possible, and this translates into the prescription 
of A, e.g., requiring that p have smallest total degree. One of the challenges for a naive 
approach to interpolation is that one may not have existence and uniqueness: the classical 
Mairhuber-Curtis theorem |56[ l32] ensures that no matter which A is chosen, there will exist 
a set of points on which interpolation is not unisolvent for P{A). 

The above is in contrast to the univariate interpolation problem where, with N nodes, degp 
is known to be exactly N — 1 regardless of (distinct) nodal choice, and the resulting polynomial 
is unique given data. In the multivariate case, the space from which the interpolant is drawn 
cannot be a priori specified if we allow the grid Zm to be freely defined; the approximation 
space must be idenified once the nodes Zn and/or the data Un is provided. 


In Section 5.1 we present one way that uniquely chooses a smallest-degree interpolatory 
polynomial given data locations Zm, which depends on the gPC weight function p that 
determines the orthogonal basis (j). The construction is independent of the data u. This 
polynomial is the Least Orthogonal Interpolant and allows us to make smallest-degree 
polynomial interpolation well-defined in multiple dimensions. This is followed in section |5.2| 


by a discussion of Lebesgue constants for accuracy and stability. Finally, sections 5.3 and 
|5.4| describes conditions under which an unstructured nodal array may have well-behaved 
Lebesgue constant. 


5.1. Least Orthogonal Interpolation. Our task in this section is to define and present an 
operator that prescribes p-dependent, smallest-degree, unisolvent polynomial interpolation in 
the multivariate setting. As previously discussed, this is not the only choice of interpolation 
operator, but it has the advantages of being (1) p-dependent, (2) automatically constructible 
simply with evaluations of the basis </>, and (3) is compuated via an extension of standard 
linear algebraic matrix factorizations. 

The Least Orthogonal Interpolant (LOI) is a generalization of the ‘least interpolation’ 
construction [MllaslES]. The LOI uses information about the nodal array Zm along with 
the density p to construct this interpolant |64) . If p is the density function corresponding to 
a multivariate standard normal random variable, then the LOI coincides with the traditional 
least interpolant. 

Given a multivariate nodal distribution Zm, define expansions of Dirac distributions 
centered at the Zm- 

OO 

(37) <5^„(-) = Yj 

\ a \=0 

Since is a representation of the reproducing kernel, one can directly verify that the formal 
sum on the right-hand side is the L^-representor for point-evaluation: u{zm) = {u, ^zm)p 
any continuous u in For any polynomial p, one can define the degree-A: projection 
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operators: 

'PkV= X] iP^^a)p4>a 

\a\<k 


Note that these are effectively the action of a series truncation of 5z^ on p. Finally, we 
introduce the ‘least-p’ operation: 


Pi,p = PkP, 

So is effectively the first nonzero 


k = min {A: G N | / 0} 


Taylor series" contribution from an expansion of the 

likewise 

depends on p. Given a collection of M nodes Z = {zi,..., zm}, the polynomial images of 
6z^ under (•)4.,p form a polynomial space: 


form (37). Because the projection operators depend on p, the polynomial p 


Uz = span I g G span {(5^,,..., <5^^ J} . 


The polynomial space Hz is the least orthogonal polynomial space for interpolation. 


Theorem 12 ([M|)- The space Hz has dimension M, and for any continuous u there is a 
unique p G such that Um = p{zm) for m = 1, ..., M. In particular, there exist Lagrange 
functions irn{z) G 11^ such that the interpolant p G II^ can expressed as 

M 

(38) P{z') — ^ ^ Uznimiz'), ^m(Zn) — ^m,n; 

m=l 

with 6m,n the Kronecker delta. 

The interpolant p is the Least Orthongoal interpolant. It depends on the weight function 
p. When p is a the Gaussian density function for a standard normal random variable, Hz 
coincides with the traditional ’least interpolant’ polynomial space of |34| . While the definition 
of the space 11^ is fairly abstract, the construction is accomplished in familiar ways: assuming 
we know k = deg 11^, let the index set A = . Form the rectangular design matrix A from 

([^. The space FI^ can be computed with a combination of LU and QR operations on this 
matrix, and the operation count is asymptotically similar to standard interpolatory matrix 
factorization schemes. Using these operations one obtains the following factorization for the 
input rectangular matrix A 


(39) 


PA = LUH, 


For a size-M interpolation problem, the matrices L and U are standard M x M lower- and 
upper-triangular matrices corresponding to an LU factorization, and P is the associated 
M X M row permutation matrix. The rectangular matrix H contains entries that identify 
the space II^. 


Once data u is given, the coefficients c from (35) that define the interpolant are computed 


as 


c = H^U"^L“^Pu 


The new matrix H is a rectangular matrix with orthonormal rows that can be obtained from 
H in a trivial manner. In order to compute c, the operation count is asymptotically identical 
to any standard LU factorization for square matrix inversion. We refer to |64) for the details. 
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5.2. Accuracy and the Lebesgue constant. The existence of Lagrange interpolants from 
Theorem allow us to extend univariate notions of stability and accuracy to multivariate 
settings. With a grid Z and the associated Lagrange polynomials 1^ dehned in (38), we can 
dehne a 2^-dependent, p-weighted Lebesgue constant: 


M 


(40) 


kp[Z) = maxp(z) ^ 


m=l 


pi^Zyyi) 


In the unweighted case p = 1, this reverts to the more familiar formula for the Lebesgue 
constant. The introduction of the weight p in the above expression is the proper way to 
generalize the Lebesgue constant to the weighted interpolation problem |54| . Note that one 
need not use the LOI space Hz in the above dehnition of Ap. 

On compact domains, the presence of a weight p bounded away from 0 and oo does not 


significantly affect the interpolation problem in the context of (40), as in such a case one 


can declare the weighted norm equivalent to the unweighted one with appropriate bounding 
constants. However, on unbounded domains it becomes a necessary consideration. In 
particular we will consider exponential weights of the form p{z) = exp(—for v >1 on 
D = IR'^. 

The error in an interpolation procedure can be understood in terms of the Lebesgue 
constant, which is the operator norm for weighted interpolation. Let Xjv be any interpolation 
operator; the LOI construction from the previous section is one such choice. We assume that 
I]\f maps continous functions to some polynomial space Pn, i.e., Ijsf : Cp{D) —)• P]\f. The 
classical Lebesgue Lemma ties the operator norm Ap of Xjv to the interpolant error: 


(41) 


sup p{z) 

zGD 


\u{z) - Inu{z)\ < {1 + Ap) Ep^n{u). 


p^n{u) is the best (smallest) possible error in the p-weighted supremum norm 


The term 

when approximating u by any element in P/y. The Lebesgue Lemma is one way of “separating" 
the total interpolation error into two parts: first the inherent error Pp^A? depending on the 
data u and the choice of approximation space, and second the instability Ap due to the 
interpolation procedure. Thus, we are ultimately interested in constructing nodal arrays Z 
with small Lebesgue constant. In the multivariate setting, this is still an open problem. 

Values for acceptable Lebesgue constants in one dimension are well-understood: on 
bounded intervals Zl C IR it is well-known that if a nodal set Z is uniformly distributed on 
the interval, then A grows exponentially as the cardinality of A is increased (see |82| and 
references therein), and thus the interpolation problem is unstable. On the other hand, for 
certain special conhgurations of A that cluster nodes towards the boundary like the arcsine 
density pc = (1 — the Lebesgue constant exhibits logarithmic growth in the mesh 

cardinality, yielding a stable interpolation scheme |181146] . Results on the one-dimensional 
unbounded real line are likewise well-established |59[ 1601 176] . 

In our multivariate setting, we will generally be concerned with whether or not an array 
of nodes exhibits exponentially-growing Lebesgue constant. In fact, since Ep^i^{u) usually 
decreases exponentially with N for u an analytic function, then subexponential growth of Ap 


can be used in Lebesgue’s Lemma (41) to conclude convergence of the interpolation scheme. 


While concrete results about Ap are available for the univariate case, these results are 
currently lacking for the multivariate case. What is known are necessary conditions that 
a grid should satisfy in order to have a “good" Lebesgue constant: The following sections 
introduce these conditions, which stem from pluripotential theory. All these results do not 
directly appeal to the geometry of the grid, and so they apply to any unstructured grids. 
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5.3. Unweighted polynomial approximation. There is a limited understanding of how 
one constructs robust interpolation grids in high dimensions. For canonical domains, many 
approaches yield effective sampling meshes (e.g., the ‘Padua points’ in two dimensions 
| 22 [ I13|i. Tensor-pro duct domains yield good results when the dimension is small {d < 3). 
Mapping techniques can produce good nodes for curvilinear domains and blending techniques 
work well for triangular and tetrahedral domains |84) . Approaches that ht more within the 
unstructured framework are directly optimized Fekete nodes [SOI |^, approximate Fekete 
points nails], and discrete Leja sequences HD. The interested reader may consult these 
references to hnd discussions of how to construct multivariate interpolation grids; here we 
will only discuss theoretical ways of characterizing a ‘good’ interpolation grid. 

Consider the case where U is a compact domain in IR'‘* and p is the uniform weight function. 
Thus, this reduces to unweighted interpolation where the weight appearing in (40) may 
be set to unity. We are interested in characterizing limiting behaviors of nodal sets A as 
N = \Z\ — >■ oo. Let Z]\f denote a nodal set of size N . We do not require monotonicity, i.e. we 
do not enforce Zj^ C Zf^^i for any N. We recall that tf is the dimension of the total-degree 
polynomial space from ([^. The following discussion requires that we restrict our attention 
to nodal sets with total-degree cardinality N = tf. for some degree k. We are concerned with 
the asymptotic behavior of selecting a ‘good’ nodal set i.e., we are concerned with the 
behavior of stability metrics of Z]\f as N ^ oo. 

The following notation will be needed: Zj^ = • ■ •; is a set of N nodes in D. 

We dehne the following constant as the sum the polynomial degrees of any basis for Tj}: 

( 42 ) 4 = 

i=i 

In this section, we will fix d > 1, and relate the cardinality N of each grid to the polynomial 
degree k: 

(43) Nk^ti 


Given a degree k, a set of nodes Z^^ is called a set of Fekete nodes for the space if it 
maximizes the design matrix determinant over all possible conhgurations: 

Z*j^^= argmax |detA(Z)|. 

Z={yi,...,yNi^)cD 

Above, A is the (square) design matrix with the monomial basis 2;“ G on the nodes i/n- 
The behavior of the maximal determinant achieved by Z'^^ defines the transhnite diameter: 

h(U)4£m |detA(Z)Vj|'/^U 

For compact D C R'’*, this limit exists. An array of nodes {Z]^}p^ whose determinant (raised 
to the 1/sf) limits to 6 is called asymptotically Fekete. 

One final concept we will need is that of equilibrium measures. For a compact D C R'’*, 
we introduce the pluripotential equilibrium measure of the set D [TllEI]. This measure 
is the multidimensional analogue of the univariate extremal logarithmic energy measure. On 
a (tensor-product) interval, the measure pjo is the (tensor-product) arcsine measure. 

The Lebesgue constant, the concept of Fekte nodes, and the equilibrium measure are 
intimately connected. As described in |10) . the following theorem considers three ways in 
which one can characterize the behavior of the array of nodes Z^ for N = tf: 


Theorem 13 f [T^ [TUI I^L Consider the three properties an array may satisfy: 
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(1) Subexponential Lebesgue eonstant growth: lirrifc^oo = 1 

(2) Asymptotieally Fekete: limfc^oo |det A = ^{D) 

(3) Distribution aeeording to the equilibrium measure: 


^ Nk 
n=l 


hD 


Then 1 => 2 and 2 ^ 3; and the reverse implieations are false. 


The proof that 1 =► 2 is given in |12) for one dimension and extends to multiple dimensions 
as shown in |10| . That 2 => 3 is proven for one dimension in |10] and in j5] for the multivariate 
case. The explanation in |5j is relativey abstract, and an accompanying informal discussion 
can be found in |52[ [53] . 

These properties give insight into how one should sample an interpolation grid in multiple 
dimensions: In order to have a stable interpolation operator for interpolation in as 
measured by the Lebesgue constant, one must asymptotically sample according to the 
measure nr). I.e., sampling according to this measure is a necessary (but not sufficient) 
condition to obtain a well-bounded Lebesgue constant. 

In the absence of constructive ways to achieve provably well-behaved Lebesgue constants 
in high dimensions on arbitrary compact domains, one usually devises methods that produce 
asymptotically Fekete nodes, or nodes that distribute according to /U/j- While such methods 
are still under active development, promising preliminary results are given by constructing 
“approximate” Fekete points m or “discrete” Leja sequences mi A slightly different route 
may be taken by optimizing a so-called Fejer-constraint that is related to the cardinal 
Lagrange polynomials of the Least Orthogonal Interpolant |64j . We also note that the the 
d-dimensional Weil points on a hypercube domain D distribute according to this is 
statement of Theorem whether or not these points are asymptotically Fekete is unknown, 
although use of the Weil points for interpolation is limited by the restriction on their 
cardinality. 


5.4. Weighted polynomial approximation. In this section we consider weighted approx¬ 
imation on the unbounded domain D = with exponential weights. We seek to present an 
extension of Theorem 13 to the weighted case; in order to do so we will need contraction 
factors to account for the unbounded nature of D. 

On D = we consider weights of the form p{z) = exp (||2;||^) for r > 1, where ||z|| is the 
Euclidean 2-norm of z £ R'^. This family of weights includes the ubiquitous Gaussian density 
for r = 2. It is convenient and common to consider the log-weight, Q{z) = — logp = ||z||^. 
As before, we work on approximation for the space T^, define sf. as in (42), and restrict 


attention to cardinalities in ( |43[ ). Given a set of nodes Z = {zi,... zm}, then for any 
c > 0 we use the notation cZ = {czi, ..., czn}- 

The weighted Lebesgue constant Ap is as in (40). The notions of Fekete arrays and 
equilibrium measures carry over to the weighted case with some modifications: an array is a 
weighted Fekete array for if it maximizes a weighted determinant: 


Nk 

^*Nk,Q- argmax |det A (Z)| /(t/j), 

Z=(yi,...,yNjcD 
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where again the design matrix A has monomial entries: {A)n,a = As with the unweighted 
case, the limit of the maximal determinants exists [53| : 

Nk 

det A , 

m=l 

and any array of points whose weighted determinant behaves like Sp{D) is called asymptotically 
weighted Fekete. For our domain D = IR*^ we will make use of the p-weighted equilibrium 
measure pd,q from weighted pluripotential theory |7411511 fTT] . We mention that the support 
supp/i£)^Q of the equilibrium measure is a compact set (even though D = is unbounded). 

For a degree k, the weighted situation requires a contraction factor where r is the 

exponent in the log-weight Q = ||^;||^. Interpolation grids with subexponentially growing 
weighted Lebesgue constant are not asymptotically weighted Fekete arrays, but contracted 
versions of them are. 


5p{D) A hm 

k—^oo 


Theorem 14 (|5l [64]1. Consider the four properties an array may satisfy: 

1 Subexponential weighted Lebesgue constant growth: limfc_^oo 1 

2 Contracted asymptotically weighted Fekete: 

Nk 

det A {k-^/'^ZNk) n 

n=l 


lim 

k^oo 


2a Uniform eontracted boundedness: There is compact set S D supp pn^Q sueh that 
k-^/^ZNk C S for all A; € N. 

3 Distribution according to the weighted equilibrium measure: 


^ Nk 

lim — Su-i/r^ „ 
k^oo Nk^ 


= hD,Q 


n=l 


Then 1 2 and (2 -|- 2a) 3; and the reverse implications are /a/se|^ 


The weighted case on the whole space IR'^ here reveals an interesting twist: We do not 
directly want to sample with respect to Pd,q] among the justifying reasons is that the measure 
has compact support, which will be of limited use in approximation with polynomials on 
an unbounded domain. However, if we sample points so that grids that are progressively 
contracted by distribute according to Pd,Qi then this is a necessary condition for 

controlled Lebesgue constant growth. 

To our knowledge it is unknown whether multidimensional weighted approximate/discrete 
Fekete/Leja arrays distribute according to the weighted pluripotential equilibrium measure. 
However, we anticipate that such a result is true, and if so would provide a powerful 
computational approach for generating optimal unstructured meshes. The one-dimensional 
affirmative answer to this question from |63| gives hope to this possibility. 


6. Summary 

The ability to construct polynomial approximations to functions of high-dimensional inputs 
is of great interest and importance in modern uncertainty quantihcation settings. We have 
explored recent advances in non-intrusive stochastic collocation methods for doing this using 

^The uniform contracted boundedness condition 2a is a technicality. Although we suspect that this 
condition is ultimately unnecessary, a direct proof that 2 => 3 seems to not yet be available. 
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a selection of geometrically unstructured meshes. With respect to N degrees of freedom in 
the polynomial expansion, and M data points, we have presented results for least-squares 
regularization for overdetermined systems (M > N), interpolatory reconstruction for square 
systems (M = N), and compressive sampling/sparse reconstruction for underdetermined 
systems (M < N). 

The results in this held as presented here are far from optimal and settled. We expect a 
great deal of upcoming and future work on the theory of approximation on unstructured 
meshes will make such an approach one of the preferred methods for collocative approaches 
in high-dimensional approximation. 
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